Improved code-based identification scheme 



Pierre-Louis Cayrel 

CASED 
Mornewegstrasse, 32 
64293 Darmstadt 
Germany 
pierre-louis.cayrel@cased.de



Pascal Veron 
IMATH 

Universite de Toulon et du Var. 
B.P. 20132, F-83957 La Garde Cedex, 
France 
veron@univ-tln.fr



Abstract: We revisit the 3-pass code-based identification scheme
proposed by Stern at Crypto'93, and give a new 5-pass protocol
for which the cheating probability is $\approx 1/2$ (instead of 2/3
in Stern's original proposal). Furthermore, we propose to use a
quasi-cyclic construction in order to dramatically reduce the size
of the public key.

The proposed scheme is zero-knowledge and relies on an
NP-complete problem coming from coding theory (namely the q-ary
Syndrome Decoding problem). Taking into account a recent study
of a generalization of Stern's information-set-decoding algorithm
for decoding linear codes over arbitrary finite fields $\mathbb{F}_q$,
we suggest parameters for which the public data is about 34 Kbits,
whereas that of Stern's scheme is about 66 Kbits. This provides a
very practical identification (and possibly signature) scheme which
is especially attractive for light-weight cryptography.

I. Introduction 

Cryptosystems based on number theory (the factorisation and
discrete logarithm problems) are more and more widely used
in the real world. Since Shor's algorithm, a quantum algorithm
that solves both of these problems in polynomial time,
there is a strong need for public key schemes
which are not based on such problems. Firstly, it would
be unreasonable to rely on only one type of hard problem: at
present, nearly all public key cryptographic products are based
on integer factorization or discrete logarithm. Secondly, even if
the above-mentioned problems remain hard, practical progress
in factorization and discrete logarithm computation forces one
to choose larger and larger keys.

Shor's algorithm does not threaten the so-called post-quantum
cryptosystems, such as lattice-based, code-based, and
multivariate-based cryptosystems.

In this paper, we consider a particular type of alternative 
cryptography, based on error-correcting code theory. Code- 
based cryptography was initiated a long time ago with the 
celebrated McEliece encryption algorithm. 

Public key identification (ID) protocols allow a party holding 
a secret key to prove its identity to any other entity holding 
the corresponding public key. The minimum security of such 
protocols should be that a passive observer who sees the inter- 
action should not then be able to perform his own interaction 
and successfully impersonate the prover. 

At Crypto'93, Stern proposed a new scheme, which is still
today the reference in this area [18]. Stern's scheme is
a multiple-round zero-knowledge protocol, where each round
is a three-pass interaction between the prover and the verifier.
Stern's scheme has two major drawbacks:

1) many rounds are required because the cheater's success
probability is 2/3, instead of 1/2 for Fiat-Shamir's
factorization-based protocol [8]; hence typically 28
rounds are needed so that this probability is less than
$2^{-16}$,

2) the public key is very large, typically 66 Kbits.

The first issue was addressed by Gaborit and Girault in [10]
and the second one was partially addressed by Veron in [20]. In
this paper, we focus on the first drawback. Using q-ary codes
instead of binary ones, we define a 5-pass identification scheme
for which the success probability of a cheater is approximately 1/2.
We then propose to use a quasi-cyclic construction to address the
second drawback.

Organisation of the paper 

In Section II we give basic facts about code-based cryptography
and describe Stern's original scheme. In Section III we propose
a new identification scheme which reduces
the number of rounds involved in the identification
process. In Section IV we describe the properties of our
proposal and study its security. Section V concludes our
contribution.

II. Code-based cryptography 

In this section we recall basic facts about code-based
cryptography. We refer to [4] for a general introduction to these
problems.

A. Definitions 

Linear codes are k-dimensional subspaces of an n-dimensional
vector space over a finite field $\mathbb{F}_q$, where k and
n are positive integers with k < n, and q is a prime power.
The error-correcting capability of such a code is the maximum
number t of errors that the code is able to decode. In short,
linear codes with these parameters are denoted (n, k, t)-codes.

Definition II.1 (Hamming weight) The (Hamming) weight of
a vector x is the number of its non-zero entries. We use wt(x) to
represent the Hamming weight of x.



Definition II.2 (Generator and Parity Check Matrix) Let
$\mathcal{C}$ be a linear code over $\mathbb{F}_q$. A generator matrix G of
$\mathcal{C}$ is a matrix whose rows form a basis of $\mathcal{C}$:

$$\mathcal{C} = \{xG : x \in \mathbb{F}_q^k\}.$$

A parity check matrix H of $\mathcal{C}$ is defined by

$$\mathcal{C} = \{x \in \mathbb{F}_q^n : Hx^T = 0\}$$

and generates the dual space of $\mathcal{C}$.
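As a small illustration of Definition II.2, the following sketch builds a systematic generator matrix G = (I_k | A) over a prime field together with the matching parity check matrix H = (-A^T | I_{n-k}), and enumerates the codewords xG. The prime p = 5, the block A and all helper names are illustrative assumptions, not taken from the paper.

```python
import itertools

P = 5  # assumption: a small prime field F_5, chosen only for illustration

def mat_vec(M, v):
    """Return M * v^T over F_P as a list."""
    return [sum(m * x for m, x in zip(row, v)) % P for row in M]

def encode(G, x):
    """Return the codeword xG over F_P."""
    n = len(G[0])
    return [sum(x[i] * G[i][j] for i in range(len(G))) % P for j in range(n)]

def systematic_pair(A):
    """From a k x (n-k) block A, build G = (I_k | A) and H = (-A^T | I_{n-k})."""
    k, r = len(A), len(A[0])
    G = [[int(i == j) for j in range(k)] + A[i] for i in range(k)]
    H = [[(-A[i][j]) % P for i in range(k)] + [int(j == t) for t in range(r)]
         for j in range(r)]
    return G, H

A = [[1, 2, 3], [4, 0, 1]]  # hypothetical 2 x 3 block, so n = 5 and k = 2
G, H = systematic_pair(A)
# All q^k = 25 codewords; each one should satisfy H c^T = 0.
codewords = [encode(G, list(x)) for x in itertools.product(range(P), repeat=2)]
```

Every codeword produced this way has a zero syndrome under H, which is exactly the second characterization of the code in the definition.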

We describe here the main hard problems on which code-based
cryptosystems are based.

Let n and r be two integers such that n > r, and let Binary(n, r) (resp.
q-ary(n, r)) be the set of binary (resp. q-ary) matrices with
n columns and r rows of rank r. Moreover, let us denote by
$x \xleftarrow{\$} A$ the fact that x is randomly selected in the set A.

Definition II.3 (binary Syndrome Decoding (SD) problem)

Input : $H \xleftarrow{\$} \mathrm{Binary}(n, r)$, $y \xleftarrow{\$} \mathbb{F}_2^r$ and an integer $\omega > 0$.

Output : A word $s \in \mathbb{F}_2^n$ such that $\mathrm{wt}(s) \le \omega$, $Hs^T = y$.

This problem was proven to be NP-complete in 1978 [3], but
only for binary codes.

Definition II.4 (q-ary Syndrome Decoding (qSD) problem)

Input : $H \xleftarrow{\$} q\text{-ary}(n, r)$, $y \xleftarrow{\$} \mathbb{F}_q^r$ and an integer $\omega > 0$.

Output : A word $s \in \mathbb{F}_q^n$ such that $\mathrm{wt}(s) \le \omega$, $Hs^T = y$.

In 1994, A. Barg proved that this result holds for codes over
all finite fields ([1], in Russian).

The problems which cryptographic applications rely upon 
can have different numbers of solutions. For example, public 
key encryption schemes usually have exactly one solution, 
while digital signatures often have more than one possible 
solution. For code-based cryptosystems, the uniqueness of
solutions can be expressed by the Gilbert-Varshamov (GV)
bound :

Definition II.5 (q-ary Gilbert-Varshamov bound)

Let $H_q(x)$ be the q-ary entropy function, given by :

$$H_q(x) = x\log_q(q - 1) - x\log_q x - (1 - x)\log_q(1 - x).$$

Suppose $0 < \delta < (q - 1)/q$. Then there exists an infinite
sequence of (n, k, d) q-ary linear codes with $d/n = \delta$ and
rate $R = k/n$ satisfying the inequality :

$$R \ge 1 - H_q(\delta) \quad \forall n.$$
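To make the bound concrete, here is a minimal sketch that evaluates the q-ary entropy function and solves $H_q(\delta) = 1 - R$ numerically. The bisection solver, its tolerance and the function names are assumptions made for illustration; the paper does not describe how the bound was evaluated.

```python
import math

def q_ary_entropy(x, q):
    """H_q(x) = x log_q(q-1) - x log_q(x) - (1-x) log_q(1-x), for 0 < x < 1."""
    def log_q(t):
        return math.log(t) / math.log(q)
    return x * log_q(q - 1) - x * log_q(x) - (1 - x) * log_q(1 - x)

def gv_relative_distance(rate, q, tol=1e-12):
    """Solve H_q(delta) = 1 - rate for delta in (0, (q-1)/q) by bisection.

    H_q is increasing on this interval, so bisection converges.
    """
    lo, hi = tol, (q - 1) / q
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if q_ary_entropy(mid, q) < 1 - rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For rate 1/2 this gives a relative distance of about 0.38 for q = 256 and about 0.11 for q = 2, so a q-ary code of a given length tolerates a much larger weight than a binary one at the same rate.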



B. SD identification schemes 

Stern's scheme is the first practical zero-knowledge
identification scheme based on the Syndrome Decoding
problem [18]. The scheme uses a binary (n - k) x n matrix H
common to all users. If H is chosen randomly, it will provide
a parity check matrix for a code with asymptotically good
minimum distance given by the (binary) Gilbert-Varshamov
(GV) bound. The private key of a user is thus a word
s of low weight $\mathrm{wt}(s) = \omega$ (with $H_2(\omega/n) \approx 1 - k/n$),
whose syndrome $Hs^T = y$ is the public key. With
Stern's 3-pass zero-knowledge protocol, the secret key holder
can prove his knowledge of s using two blending factors:
a permutation and a random vector. However, a dishonest
prover not knowing s can cheat the verifier in the protocol
with probability 2/3. Thus, the protocol has to be run several
times to detect cheating provers. The security of the scheme
relies on the difficulty of the general decoding problem, that is,
on the difficulty of determining the preimage s of $y = Hs^T$.
As mentioned in [3], the SD problem stated in terms of the
generator matrix is also NP-complete, since one can go from
the parity check matrix to the generator matrix (or vice versa) in
polynomial time. In [20], the author uses a generator matrix of
a random binary linear code as the public key and in this way defines
a dual version of Stern's scheme in order to obtain, among
other things, an improvement of the transmission rate.

Figure 1 sums up the performances of the two 3-pass SD
identification schemes for a probability of cheating bounded
by $10^{-6}$. The computation complexity is the number of bit
operations involved in the protocol and the communication
complexity is the number of exchanged bits.

Fig. 1. Performances of SD schemes



C. Attacks 

For SD identification schemes, since the matrix used is
a random one, the cryptanalyst is faced with the problem of
decoding a random binary linear code. There are two main
families of algorithms to solve this problem : Information Set
Decoding (ISD) and the (Generalized) Birthday Algorithm (GBA).
The Information Set Decoding attack seems to have the lowest
complexity.

One tries to recover the k information symbols as follows :
the first step is to pick k of the n coordinates at random, in the
hope that all of them are error-free. One then tries to recover the
message by solving a k x k linear system (binary or over $\mathbb{F}_q$).
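The idea can be sketched in its simplest form (plain ISD, often attributed to Prange) for the binary case. The sketch works on the syndrome side: it guesses that k positions are error-free and solves an r x r system on the remaining r = n - k positions, which is the counterpart of the k x k system on the message side. It is a toy illustration under these assumptions, not Stern's improved algorithm nor its generalization over $\mathbb{F}_q$; the function names and the iteration limit are hypothetical.

```python
import random

def reduce_to_identity(A, r):
    """Gauss-Jordan over F_2 on the first r columns of A; False if singular."""
    for c in range(r):
        pivot = next((i for i in range(c, r) if A[i][c]), None)
        if pivot is None:
            return False
        A[c], A[pivot] = A[pivot], A[c]
        for i in range(r):
            if i != c and A[i][c]:
                A[i] = [a ^ b for a, b in zip(A[i], A[c])]
    return True

def plain_isd(H, y, w, max_iters=10000, rng=None):
    """Search for e with H e^T = y and wt(e) <= w; return None on failure."""
    rng = rng or random.Random(0)
    r, n = len(H), len(H[0])
    for _ in range(max_iters):
        cols = list(range(n))
        rng.shuffle(cols)
        error_cols = cols[:r]  # the other k = n - r positions are guessed error-free
        # Augmented system [H restricted to error_cols | y]
        A = [[H[i][j] for j in error_cols] + [y[i]] for i in range(r)]
        if not reduce_to_identity(A, r):
            continue  # chosen columns are not invertible, pick again
        e = [0] * n
        for idx, j in enumerate(error_cols):
            e[j] = A[idx][r]
        if sum(e) <= w:
            return e
    return None
```

Each iteration succeeds only when the guessed k positions avoid the whole error support, which is why the expected number of iterations grows quickly with the weight.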

In [15], the author presents a generalization of Stern's
information-set-decoding algorithm from [17] for decoding
linear codes over arbitrary finite fields $\mathbb{F}_q$ and analyzes its
complexity. We will choose our parameters with regard to the
complexity of this attack.

III. A new identification scheme
In what follows, we consider an element of $\mathbb{F}_q^n$ as n blocks
of size $\lceil \log_2(q) \rceil = N$. We represent each element of $\mathbb{F}_q$ as
N bits. We first introduce a special transformation that we will
use in our protocol.

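Definition III.1 Let $\Sigma$ be a permutation of $\{1, \ldots, n\}$ and
$\gamma = (\gamma_1, \ldots, \gamma_n) \in \mathbb{F}_q^n$ such that $\forall i,\ \gamma_i \ne 0$. We define the
transformation $\Pi_{\gamma,\Sigma}$ as :

$$\Pi_{\gamma,\Sigma} : \mathbb{F}_q^n \to \mathbb{F}_q^n$$
$$v \mapsto (\gamma_{\Sigma(1)} v_{\Sigma(1)}, \ldots, \gamma_{\Sigma(n)} v_{\Sigma(n)})$$

Notice that $\forall \alpha \in \mathbb{F}_q$, $\forall v \in \mathbb{F}_q^n$,

$$\Pi_{\gamma,\Sigma}(\alpha v) = \alpha\,\Pi_{\gamma,\Sigma}(v) \quad\text{and}\quad \mathrm{wt}(\Pi_{\gamma,\Sigma}(v)) = \mathrm{wt}(v).$$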
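A small worked example may help to fix the notation. It uses the prime field $\mathbb{F}_5$ with n = 3; all values are chosen only for illustration and do not come from the paper.

```latex
% Toy instance over F_5 with n = 3 (illustrative values)
\[
\Sigma = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix},\qquad
\gamma = (2, 1, 3),\qquad v = (4, 0, 1)
\]
% Apply the definition coordinate by coordinate
\[
\Pi_{\gamma,\Sigma}(v) = (\gamma_2 v_2,\ \gamma_3 v_3,\ \gamma_1 v_1)
= (1\cdot 0,\ 3\cdot 1,\ 2\cdot 4) \equiv (0, 3, 3) \pmod 5
\]
% Linearity check with alpha = 2, where 2v = (3, 0, 2) mod 5
\[
\Pi_{\gamma,\Sigma}(2v) = (1\cdot 0,\ 3\cdot 2,\ 2\cdot 3) \equiv (0, 1, 1)
\equiv 2\,\Pi_{\gamma,\Sigma}(v) \pmod 5
\]
```

In both cases the image has weight 2, the same as v, since the transformation only permutes coordinates and scales them by non-zero factors.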

A. Key generation

Let r = n - k. The scheme uses a random (r x n) q-ary matrix
H common to all users. It can be considered as the parity
check matrix of a random linear (n, k) q-ary code. Without
loss of generality, we can assume that H is given in the
form $H = (I_r | M)$ where M is a random r x k matrix, since
it is well known that Gaussian elimination does not change
the code defined by H. Let $\kappa$ be the security parameter;
Algorithm 1 describes the key generation process.



Algorithm 1 Key generation($1^\kappa$)

▷ Choose n, k, $\omega$ and q such that $\mathrm{WF}_{\mathrm{ISD}}(n, r, \omega, q) \ge 2^\kappa$
▷ Randomly pick an (r x n) q-ary matrix H.
▷ Randomly pick $s \in \mathbb{F}_q^n$ with $\mathrm{wt}(s) = \omega$.
▷ Compute y such that $Hs^T = y$.
▷ Output : pk = (H, y, $\omega$) and sk = s.
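The last four steps of Algorithm 1 can be sketched as follows, storing H in the systematic form $(I_r | M)$ so that only M is kept. This is a minimal sketch under stated assumptions: a prime field $\mathbb{F}_p$ with p = 251 stands in for $\mathbb{F}_q$ (the paper suggests q = 256, which would need extension-field arithmetic), the parameter selection against the ISD workfactor is not modelled, and the function names are hypothetical.

```python
import random

P = 251  # assumption: prime field F_p in place of F_q (the paper suggests q = 256)

def syndrome_systematic(M, v):
    """Compute (I_r | M) v^T over F_P without storing the identity block."""
    r = len(M)
    return [(v[i] + sum(m * x for m, x in zip(M[i], v[r:]))) % P for i in range(r)]

def keygen(n, k, w, rng):
    """Return ((M, y, w), s): public data with H = (I_r | M), and the secret s."""
    r = n - k
    M = [[rng.randrange(P) for _ in range(k)] for _ in range(r)]
    s = [0] * n
    for j in rng.sample(range(n), w):   # w distinct random positions
        s[j] = rng.randrange(1, P)      # non-zero entries, so wt(s) = w
    y = syndrome_systematic(M, s)
    return (M, y, w), s
```

A real deployment would draw randomness from a cryptographic source rather than `random.Random`, which is used here only to keep the sketch reproducible.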



B. Identification 

The secret key holder can prove his knowledge of s using
two blending factors: the transformation mentioned above and
a random vector. However, a dishonest prover not knowing s
can cheat the verifier in the protocol with probability $\approx 1/2$.
Thus, the protocol has to be run several times to detect cheating
provers. The security of the scheme relies on the difficulty
of the general decoding problem, that is, on the difficulty of
determining the preimage s of $y = Hs^T$.
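One round of the 5-pass protocol (Algorithm 2) can be simulated as below, with prover and verifier folded into a single function. This is a sketch under stated assumptions rather than a reference implementation: a prime field $\mathbb{F}_p$ with p = 251 replaces $\mathbb{F}_q$, the commitment h is SHA-256 over a text encoding, and `random.Random` stands in for a cryptographic source of randomness. All function names are hypothetical.

```python
import hashlib
import random

P = 251  # assumption: prime field F_p in place of F_q (the paper suggests q = 256)

def syndrome(H, v):
    return [sum(h * x for h, x in zip(row, v)) % P for row in H]

def transform(gamma, sigma, v):
    """Pi_{gamma,Sigma}(v): coordinate i receives gamma[sigma[i]] * v[sigma[i]]."""
    return [gamma[j] * v[j] % P for j in sigma]

def inverse_transform(gamma, sigma, w):
    v = [0] * len(w)
    for i, j in enumerate(sigma):
        v[j] = w[i] * pow(gamma[j], -1, P) % P  # modular inverse, Python 3.8+
    return v

def commit(*parts):
    """Commitment h(...); SHA-256 over repr() is an assumption of this sketch."""
    return hashlib.sha256(repr(parts).encode()).hexdigest()

def run_round(H, y, w, s, rng, b=None):
    """Run one round with secret s; return True if the verifier accepts."""
    n = len(s)
    # Prover: random u, gamma, Sigma and the two commitments
    u = [rng.randrange(P) for _ in range(n)]
    gamma = [rng.randrange(1, P) for _ in range(n)]
    sigma = list(range(n))
    rng.shuffle(sigma)
    c1 = commit(sigma, gamma, syndrome(H, u))
    c2 = commit(transform(gamma, sigma, u), transform(gamma, sigma, s))
    # Verifier sends alpha; prover answers beta = Pi(u + alpha * s)
    alpha = rng.randrange(P)
    beta = transform(gamma, sigma, [(a + alpha * x) % P for a, x in zip(u, s)])
    # Verifier sends the challenge b
    if b is None:
        b = rng.randrange(2)
    if b == 0:  # prover reveals Sigma and gamma
        hv = syndrome(H, inverse_transform(gamma, sigma, beta))
        hu = [(t - alpha * yi) % P for t, yi in zip(hv, y)]
        return c1 == commit(sigma, gamma, hu)
    z = transform(gamma, sigma, s)  # prover reveals Pi(s)
    pu = [(bi - alpha * zi) % P for bi, zi in zip(beta, z)]
    return c2 == commit(pu, z) and sum(x != 0 for x in z) == w
```

With the genuine secret both challenges are always accepted, whereas a vector with the wrong syndrome fails the b = 0 check except when the verifier happens to pick $\alpha = 0$.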

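This protocol is repeated $\delta$ times ($\delta$ being the number of
rounds). Once $c_1$ and $c_2$ have been sent, a cheater has to be able
to answer 2q possible questions, one for each pair $(\alpha, b)$. If
he is able to answer q + 2 of these questions then he is able to
find a solution to the problem. Indeed, in that case there are at least
two distinct values of $\alpha$ for which he can answer both b = 0 and
b = 1. Let us denote by z the value sent when b = 1; for $c_1$ and $c_2$
fixed, this means that there exist $\alpha$ and $\alpha'$ distinct, and
$\beta$ and $\beta'$, such that :

$$H\Pi_{\gamma,\Sigma}^{-1}(\beta)^T - \alpha y = H\Pi_{\gamma,\Sigma}^{-1}(\beta')^T - \alpha' y$$

$$\beta - \alpha z = \beta' - \alpha' z$$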



Algorithm 2 Identification Scheme

Private key, sk: $s \in \mathbb{F}_q^n$ such that $Hs^T = y$ and $\mathrm{wt}(s) = \omega$.

Public key, pk: H an ((n - k) x n) random matrix of rank
n - k over $\mathbb{F}_q$, h a collision-resistant hash function,
$y \in \mathbb{F}_q^{n-k}$ and $\omega \in \mathbb{N}$.

▷ Prover: generates a vector $u \in \mathbb{F}_q^n$, a vector $\gamma \in \mathbb{F}_q^n$
with non-zero entries and a permutation $\Sigma$ over $\{1, \ldots, n\}$ at random,
and computes the commitments :

1: Set $c_1 \leftarrow h(\Sigma, \gamma, Hu^T)$

2: Set $c_2 \leftarrow h(\Pi_{\gamma,\Sigma}(u), \Pi_{\gamma,\Sigma}(s))$

3: Sends the commitments $\{c_1, c_2\}$ to the Verifier.

▷ Verifier: chooses a random $\alpha \in \mathbb{F}_q$ and sends it to the
Prover.

▷ Prover: sends $\Pi_{\gamma,\Sigma}(u + \alpha s) = \beta \in \mathbb{F}_q^n$ to the Verifier.

▷ Verifier: sends a challenge $b \in \{0, 1\}$ to the Prover.

▷ Prover: answers the challenge
4: if b = 0 then reveals $\Sigma$ and $\gamma$.
5: else if b = 1 then reveals $\Pi_{\gamma,\Sigma}(s)$.
6: end if

▷ Verifier: checks commitment correctness
7: if b = 0 then checks if $c_1 = h(\Sigma, \gamma, H\Pi_{\gamma,\Sigma}^{-1}(\beta)^T - \alpha y)$
is correct

8: else if b = 1 then checks if $c_2 = h(\beta - \alpha\Pi_{\gamma,\Sigma}(s), \Pi_{\gamma,\Sigma}(s))$
is correct and if $\mathrm{wt}(\Pi_{\gamma,\Sigma}(s)) = \omega$.
9: end if



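with $\mathrm{wt}(z) = \omega$. This gives $\beta - \beta' = (\alpha - \alpha')z$ and
$H\Pi_{\gamma,\Sigma}^{-1}(\beta - \beta')^T = (\alpha - \alpha')y$. We then deduce
$H\Pi_{\gamma,\Sigma}^{-1}(z)^T = y$ and $\mathrm{wt}(\Pi_{\gamma,\Sigma}^{-1}(z)) = \mathrm{wt}(z) = \omega$.

Hence the success probability of a cheater is bounded by

$$\frac{q + 1}{2q}.$$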
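The effect of repeating the protocol can be estimated with a few lines of code. The sketch assumes that rounds are independent, so the per-round bound is simply raised to the number of rounds; the function names are hypothetical.

```python
import math

def cheating_bound(q):
    """Per-round bound (q + 1) / (2q) on a cheater's success probability."""
    return (q + 1) / (2 * q)

def log2_cheating_after_rounds(q, rounds):
    """log2 of the bound after the given number of independent rounds."""
    return rounds * math.log2(cheating_bound(q))
```

For q = 256 the per-round bound is 257/512, slightly above 1/2, and 16 rounds give a bound of roughly $2^{-15.9}$.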

C. Signature 

By using the so-called Fiat-Shamir paradigm [8], it is
theoretically possible to convert this protocol into a signature
scheme, even if this is questionable in practice, since the
signature is large.

IV. Properties and security of the scheme 
A. Zero-knowledge 

Let $I = (H, y, \omega)$ be the public data shared by the prover
and the verifier and let P(I, s) be the following predicate :

P(I, s) = "s is a vector which satisfies $Hs^T = y$, $\mathrm{wt}(s) = \omega$";
then :

Proposition IV.1 The protocol is an interactive proof of
knowledge for P(I, s).

Due to the limited size of this paper we do not detail this
proposition here.

Theorem IV.1 The protocol is a zero-knowledge interactive
proof for P(I, s) in the random oracle model.



We will detail this theorem in a longer version of this article.
However, notice that because of the randomness of u and $\alpha$,
only random values are exchanged during the protocol.

B. Security and Parameters 

As for binary SD identification schemes, the security of our
scheme relies on three properties of random linear q-ary codes :

1) Random linear codes satisfy the q-ary Gilbert-Varshamov
lower bound [11];

2) For large n, almost all linear codes lie over the
Gilbert-Varshamov bound [16];

3) Solving the q-ary syndrome decoding problem for
random codes is NP-complete [1].

Now, taking into account the bounds on the workfactor of the
Information Set Decoding algorithm over $\mathbb{F}_q$ given in [14]
(which generalize the bounds given in [9]), we have
to set up parameters in order to obtain a practical scheme
with a security level of at least $2^{80}$. We then have
to choose the number of rounds in order to minimize the
probability of success of a cheater.

Since we deal with random codes, we have to select
parameters with respect to the Gilbert-Varshamov bound (see
Definition II.5), i.e. choose a weight $\omega$ with respect to this
bound. Moreover, as usual, we will now suppose k = r = n/2.
Let N be the number of bits needed to encode an element
of $\mathbb{F}_q$, $\ell_h$ the output size of the hash function h, $\ell_\Sigma$ (resp.
$\ell_\gamma$) the size of the seed used to generate the permutation $\Sigma$
(resp. the vector $\gamma$) and $\delta$ the number of rounds. We have the
following properties for our scheme :

Size of the public data in bits :

$k \times k \times N + nN$ (we use the systematic form of H)

Total number of bits exchanged :

$\delta\,(2\ell_h + N + nN + 1 + (\ell_\Sigma + \ell_\gamma + nN)/2)$

Computation complexity over $\mathbb{F}_q$ :

$\delta\,((k + n + 2\,\mathrm{wt}(s))$ multiplications $+\ (k + \mathrm{wt}(s))$ additions$)$

To obtain a precise estimate of the workfactor of
ISD algorithms over $\mathbb{F}_q$ we have used the code developed
by C. Peters, which estimates (using a Markov chain
implementation) the number of iterations of the best algorithm
for an attack using all known tricks [15]. ISD
algorithms depend on a set of parameters, and this code makes it
possible to test which ones minimize the complexity of the attack.

We suggest the following parameters for our scheme :

q = 256, n = 128, k = 64, wt(s) = 49.

The complexity of an attack using ISD algorithms is then at
least $2^{87}$. For the same security level in SD schemes, we need
to take n = 700, k = 350, wt(s) = 75. Figure 2 sums up the
characteristics of our scheme and those of SD schemes for the
same level of security and a probability of cheating of $2^{-16}$.
We considered that all seeds used are of length 128.

Fig. 2. SD schemes vs. q-ary SD scheme

To obtain a security level of $2^{128}$, the suggested parameters are
q = 256, n = 208, k = 104, wt(s) = 78
which gives a scheme with the following properties :

Rounds : 16

Public data (bits) : 88192

Communication (bits) : 46224

Computation (bytes op.) : $2^{12.9}$ mult. and $2^{11.5}$ add.
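These figures can be reproduced from the three cost formulas of this section. The sketch below assumes a hash output size of 128 bits, a value that is not stated in the text but is consistent with the communication figure when all seeds are 128 bits long; the function name is hypothetical.

```python
import math

def scheme_costs(q, n, k, w, rounds, l_h=128, l_sigma=128, l_gamma=128):
    """Return (public bits, exchanged bits, multiplications, additions).

    l_h = 128 is an assumption of this sketch; the seed lengths follow
    the statement that all seeds are of length 128.
    """
    N = math.ceil(math.log2(q))  # bits per field element
    public_bits = k * k * N + n * N
    # The last term is averaged over the two possible challenges b.
    comm_bits = rounds * (2 * l_h + N + n * N + 1 + (l_sigma + l_gamma + n * N) / 2)
    mults = rounds * (k + n + 2 * w)
    adds = rounds * (k + w)
    return public_bits, comm_bits, mults, adds
```

Calling `scheme_costs(256, 208, 104, 78, 16)` gives 88192 public bits, 46224 exchanged bits, 7488 multiplications and 2912 additions, i.e. about $2^{12.9}$ and $2^{11.5}$ operations. With the 80-bit parameters (n = 128, k = 64) the public data comes to 33792 bits, the roughly 34 Kbits quoted in the abstract.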

Notice that in [18], Stern proposed two 5-pass variants
of his scheme : one to minimize the computing load and the
other one to lower the number of rounds. However, for these
two variants, the size of the public data and the communication
complexity are greater than those of our scheme. A precise
comparison of the computation complexity will be made in a
longer version of this paper.

C. Reducing public key size 

1) Quasi-cyclic construction: In [10], the authors propose
a variation of the Stern identification scheme using double
circulant codes. The circulant structure of the public matrix
makes the computation very easy without having to generate
the whole binary matrix; indeed, the whole scheme needs
very little memory. They propose a scheme with a public
key of size 347 bits and a private key of size 694 bits.

We can use this construction in our context by replacing the
random q-ary matrix H by a random q-ary double circulant
matrix.
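The memory saving can be illustrated with a double circulant matrix $H = (I_k | C)$, where the circulant block C is fully determined by its first row. The sketch assumes a prime field $\mathbb{F}_p$ with p = 251 in place of $\mathbb{F}_q$ and a right-shift convention for the rows of C; both choices, and the function names, are illustrative.

```python
P = 251  # assumption: prime field F_p in place of F_q (the paper suggests q = 256)

def circulant(first_row):
    """Full k x k circulant matrix whose row i is first_row shifted right by i."""
    k = len(first_row)
    return [[first_row[(j - i) % k] for j in range(k)] for i in range(k)]

def double_circulant_syndrome(first_row, v):
    """Compute (I_k | C) v^T over F_P using only the first row of C."""
    k = len(first_row)
    left, right = v[:k], v[k:]
    return [(left[i] + sum(first_row[(j - i) % k] * right[j] for j in range(k))) % P
            for i in range(k)]
```

Only k field elements have to be stored for the matrix, that is kN bits; for k = 64 and N = 8 this amounts to 512 bits.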

2) Quasi-dyadic construction: We can also imagine a
construction based on quasi-dyadic codes as proposed in [13].

Recently, several new structural attacks appeared in [19] and
[7] that extract the private key for some parameters of the
variants presented in [2] and [13]. But in our context we deal
with random codes, and we are not threatened by this kind of attack.

Furthermore, in [6] the authors describe a secure
implementation of the Stern scheme using quasi-circulant codes. Our
proposal inherits the good properties of the original Stern
scheme with respect to information leakage, such as SPA and DPA attacks.

The parameters using quasi-cyclic or quasi-dyadic random
codes are q = 256, n = 128, k = 64, wt(s) = 49; this gives a
public key of 512 bits and a private key of 1024 bits for almost
the same complexity of an ISD attack.



3) Embedding the secret in the matrix: In fact, it is possible
to further decrease the sizes obtained in the previous subsection.
The idea (from [10]) consists in embedding the secret key x
in the public matrix. To achieve that, we consider the secret as
a word of the dual code of the code generated by the public
matrix H. This means that we will use a null syndrome, which
does not change the zero-knowledge property. We will detail
this improvement in a longer version of this paper.

V. Conclusion 

We have defined an identification scheme which among all 
the schemes based on the SD problem has the best parameters 
for the size of the public data as well as for the communication 
complexity. Moreover, we have proposed a variant so as to 
reduce the size of the public data. 

The improvement proposed here to the Stern scheme can
be applied to all the Stern-based identification and signature
schemes (such as identity-based identification and signature [5] or
threshold ring signature [12], for example).

We believe that this type of scheme is a realistic alternative
to the usual number-theoretic identification schemes in
constrained environments such as smart cards, and in
applications such as Pay-TV or vending machines.